Manipulating job-folders

IPython is an ingenious combination of a bash-like terminal with a Python shell. It can be used both for bash-related affairs, such as copying files around and creating directories, and for actual Python programming. In fact, the two can be combined to create a truly powerful shell.

Alternatively, Jupyter provides an attractive graphical interface for performing data analysis, or for demonstrating pylada, as in this notebook.

Pylada puts these tools to good use by providing a command-line approach to manipulating job-folders (see the relevant notebook for more information), launching actual calculations, and collecting the results. When used in conjunction with python plotting libraries, e.g. matplotlib, it provides a rapid turnaround from conceptualization to result analysis.

Assuming that pylada is installed, its IPython extension can be loaded in IPython/Jupyter with:


In [1]:
%load_ext pylada

Prep

Pylada's IPython interface revolves around job-folders. In order to explore its features, we first need to create job-folders, preferably some which do not involve heavy calculations. The following creates a dummy.py file in the current directory. It contains a dummy functional that does very little work. In actual runs, everything dummy would be replaced with wrappers around VASP or Quantum Espresso.


In [2]:
%%writefile dummy.py
def functional(structure, outdir=None, value=False, **kwargs):
    """ A dummy functional """
    from copy import deepcopy
    from pickle import dump
    from random import random
    from py.path import local

    structure = deepcopy(structure)
    structure.value = value
    outdir = local(outdir)
    outdir.ensure(dir=True)
    # Pretend output: pickle a random "energy" together with the structure, the
    # value, and the functional itself into OUTCAR.
    dump((random(), structure, value, functional), outdir.join('OUTCAR').open('wb'))

    return Extract(outdir)


def Extract(outdir=None):
    """ An extraction function for a dummy functional """
    from collections import namedtuple
    from pickle import load
    from py.path import local

    if outdir is None:
        outdir = local()
    Extract = namedtuple('Extract', ['success', 'directory',
                                     'structure', 'energy', 'value', 'functional'])
    outdir = local(outdir)
    if not outdir.check():
        return Extract(False, str(outdir), None, None, None, None)
    if not outdir.join('OUTCAR').check(file=True):
        return Extract(False, str(outdir), None, None, None, None)
    with outdir.join('OUTCAR').open('rb') as file:
        # OUTCAR holds (energy, structure, value, functional), in that order.
        energy, structure, value, functional = load(file)
        return Extract(True, str(outdir), structure, energy, value, functional)
functional.Extract = Extract


Overwriting dummy.py
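
As a quick sanity check (a small sketch, not part of the original walk-through), the dummy functional can be called directly, outside of any job-folder; tmp/scratch is just an arbitrary output directory:

from dummy import functional
from pylada.crystal.binary import zinc_blende

# Runs the dummy functional once: it writes tmp/scratch/OUTCAR and returns the
# Extract namedtuple (success, directory, structure, energy, value, functional).
result = functional(zinc_blende(), outdir='tmp/scratch', value=5)
print(result.success, result.energy, result.value)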

The notebook about creating job folders has more details about this functional. For now, let us create a jobfolder with a few jobs:


In [3]:
from dummy import functional
from pylada.jobfolder import JobFolder
from pylada.crystal.binary import zinc_blende

root = JobFolder()

structures = ['diamond', 'diamond/alloy', 'GaAs']
stuff = [0, 1, 2]
species = [('Si', 'Si'), ('Si', 'Ge'), ('Ga', 'As')]

for name, value, types in zip(structures, stuff, species):
    job = root / name
    job.functional = functional
    job.params['value'] = value
    job.params['structure'] = zinc_blende()

    # Attribute access on a job-folder forwards to its params, so job.structure
    # is the zinc-blende structure we just stored.
    for atom, specie in zip(job.structure, types):
        atom.type = specie

Saving and Loading a job-folder

At this point, we have a job-folder stored in memory in a python variable. If you were to exit IPython, the job-folder would be lost for ever and ever. We can save it to disk with:


In [4]:
%mkdir -p tmp
%savefolders tmp/dummy.dict root


Saved job folder to /Users/mdavezac/workspaces/pylada-light/src/pylada-light/notebooks/tmp/dummy.dict.

The next time ipython is entered, the job-folder can be loaded from disk with:


In [5]:
%explore tmp/dummy.dict


Loaded job list from /Users/mdavezac/workspaces/pylada-light/src/pylada-light/notebooks/tmp/dummy.dict.

Once a folder has been explored from disk, %savefolders can be called without arguments.
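
A typical round-trip therefore looks like the following sketch, reusing the tmp/dummy.dict file created above:

%explore tmp/dummy.dict
# ... modify the job-folder here ...
%savefolders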

The percent (%) sign indicates that these commands are IPython magic functions. To get more information about what Pylada magic functions do, call them with "--help".


In [6]:
%explore --help


usage: %explore [-h] [--file | --expression] [TYPE] [JOBFOLDER]

Opens a job-folder from file on disk.

positional arguments:
  TYPE          Optional. Specifies what kind of job folders will be explored.
                Can be one of results, errors, all, running. "results" are
                those job folders which have completed. "errors" are those job
                folders which are not "running" at the time of invokation and
                failed somehow. "all" means all job folders. By default, the
                dictionary is read as it was saved. The modified job-folder is
                not saved to disk.
  JOBFOLDER     Job-dictionary variable or path to job folder saved to disk.

optional arguments:
  -h, --help    show this help message and exit
  --file        JOBFOLDER is a path to a job-dictionary stored on disk.
  --expression  JOBFOLDER is a python expression.

Tip: The current job-folder and the current job-folder path are stored in pylada.interactive.jobfolder and pylada.interactive.jobfolder_path. In practice, accessing those directly is rarely needed.
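
Should you ever need them, they are ordinary Python objects; a minimal sketch:

from pylada import interactive

# The JobFolder loaded by %explore and the file it was read from.
print(interactive.jobfolder)
print(interactive.jobfolder_path)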

Listing job-folders

The executable content of the current job-folder (the one loaded via %explore) can be examined with:


In [7]:
%listfolders all


/GaAs/
/diamond/
/diamond/alloy/

This prints out the executable jobs. It can also be used to examine the content of specific subfolders.


In [8]:
%listfolders diamond/*


diamond/alloy 

The syntax is the same as on the bash command line. When given an argument other than "all", %listfolders lists only the matching subfolders, including those which are not executable. In practice, it works like "ls -d".
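
For example, any glob that would work with "ls -d" can be used (a small sketch, output omitted):

%listfolders /*
%listfolders /dia*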

Executable job-folders are those that have a functional attached and are thus ready to run.

Navigating the job-folders

The %goto command reproduces the functionality of the "cd" unix command.


In [9]:
%goto /diamond


In diamond, but no corresponding directory on disk.

The current job-folder is now diamond. Were there a corresponding sub-directory on disk, the current working directory would also be diamond. As it is, we have not yet launched the calculations, so no such directory exists. This feature makes it easy to navigate job-folders and output directories simultaneously.

We can check the subfolders contained within /diamond.


In [10]:
%listfolders


alloy  

And calling %goto without an argument will print out the current location (much like pwd does for directories).


In [11]:
%goto


Current position in job folder: /diamond/
Filename of job-folder:  /Users/mdavezac/workspaces/pylada-light/src/pylada-light/notebooks/tmp/dummy.dict

We can also use relative paths, as well as .., to navigate around the tree structure. Almost any path that works for cd will work with %goto as well.


In [12]:
%goto ..
%goto
%listfolders


Current position in job folder: /
Filename of job-folder:  /Users/mdavezac/workspaces/pylada-light/src/pylada-light/notebooks/tmp/dummy.dict
GaAs  diamond  

Examining the executable content of a jobfolder

It is always possible to change the executable data of a job-folder, whether the functional itself or its parameters. To do this, we first navigate to the specific subfolder of interest and then use the object jobparams.current.


In [13]:
%goto /diamond/alloy/
assert jobparams.current.functional == functional


In alloy, but no corresponding directory on disk.

Parameters can be accessed either through the params dictionary:


In [14]:
jobparams.current.params.keys()


Out[14]:
dict_keys(['structure', 'value'])

Or directly as attributes of jobparams.current:


In [15]:
assert jobparams.current.value == 1

Simultaneously examining/modifying parameters for many jobs at a time

It is likely that a whole group of calculations will share parameters in common, and that these parameters need to stay consistent. The computational parameters of any number of jobs can be examined simultaneously:


In [16]:
%goto /
jobparams.structure.name


Out[16]:
{
  '/GaAs/':          'Zinc-Blende',
  '/diamond/':       'Zinc-Blende',
  '/diamond/alloy/': 'Zinc-Blende',
}

There are two things to note here:

  1. The returned object duck-types a dictionary. The keys are the job names and the values are the property of interest.
  2. It is possible to access attributes (here name) of attributes (here structure) to any degree of nesting. If the parameter of a given job does not contain the nested attribute, then that job is simply ignored (see the short sketch after this list).
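
For instance, with the job-folders built earlier in this notebook, the value parameter of every job can be gathered in one go; an attribute carried by only some folders would simply yield a smaller dictionary. A short sketch:

jobparams.value
# should look something like
# {'/GaAs/': 2, '/diamond/': 0, '/diamond/alloy/': 1}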

We can set parameters much the same way:


In [17]:
jobparams.structure.name = 'hello'
jobparams.structure.name


Out[17]:
{
  '/GaAs/':          'hello',
  '/diamond/':       'hello',
  '/diamond/alloy/': 'hello',
}

By default, it is only possible to modify existing attributes; new attributes cannot be added this way.
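
This default is reportedly governed by an only_existing flag on the jobparams object; treat the flag name below as an assumption rather than something demonstrated in this notebook:

# Assumed flag: with only_existing set (the default), assigning an attribute
# that no job parameter already carries is refused instead of silently created.
jobparams.only_existing = True
jobparams.structure.name = 'hello'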

Finally, it is possible to focus on a specific subset of job-folders. By default the syntax is that of a unix shell. However, it can be switched to regular expressions via the Pylada parameter pylada.unix_re. Only the former syntax is illustrated here:


In [18]:
jobparams['*/alloy'].structure.name


Out[18]:
'hello'

Note that when only one item is left in the dictionary, that item is returned directly: there is only one job-folder matching "*/alloy". This behavior can be turned on and off using the parameters jobparams_naked_end and/or JobParams.naked_end. The unix shell-like patterns can be either absolute paths, when preceded with '/', or relative. In the latter case, they are relative to the current position in the job-folder, as changed by %goto.
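
A minimal sketch of the naked_end switch mentioned above: with it disabled, even a single match comes back as a dictionary-like object:

jobparams.naked_end = False
jobparams['*/alloy'].structure.name
# -> {'/diamond/alloy/': 'hello'}
jobparams.naked_end = True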

When the return looks like a dictionary, it behaves like a dictionary. Hence it can be iterated over:


In [19]:
for key, value in jobparams['diamond/*'].structure.name.items():
    print(key, value)


/diamond/ hello
/diamond/alloy/ hello

Launching calculations

Turning job-folders on and off

Using jobparams, it is possible to turn job-folders on and off:


In [20]:
%goto /
jobparams['diamond/alloy'].onoff = 'on'
jobparams.onoff


Out[20]:
{'/GaAs/': 'on', '/diamond/': 'on', '/diamond/alloy/': 'on'}

When "off", a job-folder is ignored by jobparams (and by collect, described below). Furthermore, it will not be executed. The only way to access it again is to turn it back on. Groups of calculations can be turned on and off using the unix shell-like syntax described previously.

WARNING: You should always save the job-folder after changing its on/off status, because the launched computations re-read the dictionary from disk.
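
For example, a whole group can be switched off with the same glob syntax and the change saved, then restored so the rest of this notebook is unaffected (a sketch):

jobparams['diamond/*'].onoff = 'off'
%savefolders
jobparams['diamond/*'].onoff = 'on'
%savefolders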


In [21]:
%savefolders


Saved job folder to /Users/mdavezac/workspaces/pylada-light/src/pylada-light/notebooks/tmp/dummy.dict.

Submitting job-folder calculations

Once job-folders are ready, it takes all of one line to launch the calculations:

IPython
%launch scattered

This will create one PBS/Slurm job per executable job-folder. A number of options are available to select the number of processors, the account or queue, the walltime, and so on. To examine them, run %launch scattered --help:


In [22]:
%launch scattered --help


usage: %launch scattered [-h] [--force] [--walltime WALLTIME]
                         [--prefix PREFIX] [--nolaunch] [--nbprocs NBPROCS]
                         [--ppn PPN] [--account ACCOUNT] [--feature FEATURE]
                         [--queue QUEUE] [--debug]
                         [FILE [FILE ...]]

A separate PBS/slurm script is created for each and every calculation in the
job-folder (or dictionaries).

positional arguments:
  FILE                 Optional path to a job-folder. If not present, the
                       currently loaded job-dictionary will be launched.

optional arguments:
  -h, --help           show this help message and exit
  --force              If present, launches all untagged jobs, even those
                       which completed successfully.
  --walltime WALLTIME  walltime for jobs. Should be in hh:mm:ss format.
                       Defaults to 00:30:00.
  --prefix PREFIX      Adds prefix to job name.
  --nolaunch           Does everything except calling qsub.
  --nbprocs NBPROCS    Can be an integer, in which case it specifies the
                       number of processes to exectute jobs with. Can also be
                       a callable taking a JobFolder as argument and returning
                       a integer. Will default to as many procs as there are
                       atoms in that particular structure. Defaults to
                       something close to the number of atoms in the structure
                       (eg good for VASP).
  --ppn PPN            Number of processes per node. Defaults to 4.
  --account ACCOUNT    Launches jobs on specific account if present.
  --feature FEATURE    Launches jobs on specific feature if present.
  --queue QUEUE        Launches jobs on specific queue if present.
  --debug              launches in interactive queue if present.

Most default values are taken from pylada.default_pbs. The number of processors defaults to the even number closest to the number of atoms in the structure (apparently a recommended VASP default). The number of processes can be given either as an integer or as a function which takes a job-folder as its only argument and returns an integer.
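
A sketch of a typical submission, using only the flags listed in the help above; the queue name is a placeholder for whatever your cluster provides, and --nolaunch writes the PBS/Slurm scripts without actually submitting them:

%launch scattered --walltime 02:00:00 --ppn 8 --nbprocs 16 --queue regular --nolaunch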

Other possibilities for launching jobs can be listed as follows:


In [23]:
%launch --help


usage: %launch [-h] {scattered,interactive,asone,single} ...

positional arguments:
  {scattered,interactive,asone,single}
                        Launches one job per untagged calculations

optional arguments:
  -h, --help            show this help message and exit

In this notebook, we will be using %launch interactive, since the jobs are simple and since we cannot be sure that pylada has been configured for PBS, Slurm, or another queueing system.


In [24]:
%launch interactive


Working on GaAs/ in /Users/mdavezac/workspaces/pylada-light/src/pylada-light/notebooks/tmp/dummy.dict.
Working on diamond/ in /Users/mdavezac/workspaces/pylada-light/src/pylada-light/notebooks/tmp/dummy.dict.
Working on diamond/alloy/ in /Users/mdavezac/workspaces/pylada-light/src/pylada-light/notebooks/tmp/dummy.dict.

At this juncture, we should find that the jobs have created a number of output files in the directory where dummy.dict is located. You may remember from the start of this lesson that we loaded the dictionary with %explore tmp/dummy.dict. The location of that file is what matters; the current working directory does not.


In [25]:
%%bash
if command -v tree > /dev/null; then tree; fi


.
├── GaAs
│   └── OUTCAR
├── diamond
│   ├── OUTCAR
│   └── alloy
│       └── OUTCAR
├── dummy.dict
└── jobA
    └── OUTCAR

4 directories, 5 files

Had the diamond/alloy folder been turned off earlier, it would not have run. You can check this by going back a few cells, setting it to 'off', saving the folder, and re-running %launch interactive: the folder is then skipped until it is turned back on.

We can now use %goto to navigate simultaneously through the job-folder and the corresponding directories on disk.


In [26]:
%goto /diamond
print("current location: ", jobparams.current.name)


current location:  /diamond/

In [27]:
%%bash
if command -v tree > /dev/null; then tree; fi


.
├── OUTCAR
└── alloy
    └── OUTCAR

1 directory, 2 files

Collecting results

The first thing one wants to know about the calculations is whether they ran successfully:


In [28]:
%goto /
collect.success


Out[28]:
{
  '/GaAs/':          True,
  '/diamond/':       True,
  '/diamond/alloy/': True,
}

Our dummy functional is too simple to fail... However, if you delete one of the calculation directories and try again, you will find some False entries. Beware that some collected results are cached so that they can be retrieved faster the second time around; redoing %explore some.dict might be necessary.
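
In that situation, something along these lines refreshes the view; the optional TYPE argument shown in %explore --help can also restrict it to the folders that failed:

%explore tmp/dummy.dict
%explore errors tmp/dummy.dict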

Warning: Success means that the calculations ran to completion. It does not mean that the results are not garbage.

Results from the calculations can be retrieved in much the same way that the parameters were examined. This time, however, we use an object called collect (still without a preceding "%" sign). Assuming the job-folders created earlier were launched, the random energies created by our fake functional can be retrieved with:


In [29]:
collect.energy


Out[29]:
{
  '/GaAs/':          0.16078907235078244,
  '/diamond/':       0.7704390379154654,
  '/diamond/alloy/': 0.5635669216988963,
}

What exactly can be collected this way will depend on the actual calculation. The easiest way to examine what is available is to hit collect.[TAB]. The collected results can be iterated over and focused on a few relevant calculations, exactly as was done with jobparams. The advantage is that further job-folders can easily be constructed which take the calculations a bit further. For instance, we have created job-folders which relax spin-polarized crystal structures; a second wave of job-folders would then be created from the resulting relaxed crystal structures to examine the different possible magnetic orders.
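
For instance, a small sketch mirroring the jobparams loop above, iterating over the energies of the diamond folders only:

for name, energy in collect['diamond/*'].energy.items():
    print(name, energy)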